1.
Genome Biol Evol ; 16(3), 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38340334

ABSTRACT

Fishes of the genus Carassius are useful experimental vertebrate models for the study of evolutionary biology and cytogenetics. Carassius demonstrates diverse biological characteristics, such as variation in ploidy levels and chromosome numbers, and presence of microchromosomes. Those Carassius polyploids with ≥150 chromosomes have microchromosomes, but the origin of microchromosomes, especially in European populations, is unknown. We used cytogenetics to study evolution of tandem repeats (U1 and U2 small nuclear DNAs and H3 histone) and microchromosomes in Carassius from the Czech Republic. We tested whether the number of tandem repeats was affected by polyploidization or by divergence between species, and asked what mechanism drives the evolution of microchromosomes. Tandem repeats were found in tetraploid and hexaploid Carassius gibelio, and tetraploid Carassius auratus and Carassius carassius in conserved numbers, with the exception of U1 small nuclear DNA in C. auratus. This conservation indicates reduction and/or loss in the number of copies per locus in hexaploids and may have occurred by divergence rather than polyploidization. To study the evolution of microchromosomes, we used the whole microchromosome painting probe from hexaploid C. gibelio and hybridized it to tetraploid and hexaploid C. gibelio, and tetraploid C. auratus and C. carassius. Our results revealed variation in the number of microchromosomes in hexaploids and indicated that the evolution of the Carassius karyotype is governed by macrochromosome fissions followed by segmental duplication in pericentromeric areas. These are potential mechanisms responsible for the presence of microchromosomes in Carassius hexaploids. Differential efficacy of one or both of these mechanisms in different tetraploids could ensure variability in chromosome number in polyploids in general.


Subjects
Cyprinidae; Segmental Duplications, Genomic; Animals; Tetraploidy; Cytogenetic Analysis; Tandem Repeat Sequences; Polyploidy
2.
Mol Biol Evol ; 41(3), 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38376487

ABSTRACT

The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.


Subjects
Balaenoptera; Neoplasms; Animals; Balaenoptera/genetics; Segmental Duplications, Genomic; Genome; Demography; Neoplasms/genetics
3.
New Phytol ; 242(2): 610-625, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38402521

ABSTRACT

Many pathogens evolved compartmentalized genomes with conserved core and variable accessory regions (ARs) that carry effector genes mediating virulence. The fungal plant pathogen Fusarium oxysporum has such ARs, often spanning entire chromosomes. The presence of specific ARs influences the host range, and horizontal transfer of ARs can modify the pathogenicity of the receiving strain. However, how these ARs evolve in strains that infect the same host remains largely unknown. We defined the pan-genome of 69 diverse F. oxysporum strains that cause Fusarium wilt of banana, a significant constraint to global banana production, and analyzed the diversity and evolution of the ARs. Accessory regions in F. oxysporum strains infecting the same banana cultivar are highly diverse, and we could not identify any shared genomic regions and in planta-induced effectors. We demonstrate that segmental duplications drive the evolution of ARs. Furthermore, we show that recent segmental duplications specifically in accessory chromosomes cause the expansion of ARs in F. oxysporum. Taken together, we conclude that extensive recent duplications drive the evolution of ARs in F. oxysporum, which contribute to the evolution of virulence.


Subjects
Fusarium; Genome, Fungal; Segmental Duplications, Genomic; Fusarium/genetics; Host Specificity; Genomics; Plant Diseases/genetics; Plant Diseases/microbiology
4.
Int J Mol Sci ; 24(21), 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37958807

ABSTRACT

The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader-Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species primarily by accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies its association with human susceptibility to recurrent disease-associated rearrangements.


Subjects
Autistic Disorder; Prader-Willi Syndrome; Animals; Humans; DNA Copy Number Variations/genetics; Primates/genetics; Prader-Willi Syndrome/genetics; Segmental Duplications, Genomic/genetics; Autistic Disorder/genetics; Chromosomes, Human, Pair 15/genetics; Gene Duplication
5.
Genome Biol ; 24(1): 205, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37697406

ABSTRACT

Resolving complex genomic regions rich in segmental duplications (SDs) is challenging due to the high error rate of long-read sequencing. Here, we describe a targeted approach with a novel genome assembler PhaseDancer that extends SD-rich regions of interest iteratively. We validate its robustness and efficiency using a golden-standard set of human BAC clones and in silico-generated SDs with predefined evolutionary scenarios. PhaseDancer enables extension of the incomplete complex SD-rich subtelomeric regions of Great Ape chromosomes orthologous to the human chromosome 2 (HSA2) fusion site, informing a model of HSA2 formation and unravelling the evolution of human and Great Ape genomes.


Subjects
Hominidae; Humans; Animals; Hominidae/genetics; Segmental Duplications, Genomic; Telomere; Genomics; Chromosomes, Human
6.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subjects
Chromosomes, Human, Y; Genomics; Sequence Analysis, DNA; Humans; Base Sequence; Chromosomes, Human, Y/genetics; DNA, Satellite/genetics; Genetic Variation/genetics; Genetics, Population; Genomics/methods; Genomics/standards; Heterochromatin/genetics; Multigene Family/genetics; Reference Standards; Segmental Duplications, Genomic/genetics; Sequence Analysis, DNA/standards; Tandem Repeat Sequences/genetics; Telomere/genetics
7.
Genome Biol ; 24(1): 157, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37403156

ABSTRACT

BACKGROUND: The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies are not characterized in detail yet. RESULTS: Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION: Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.


Subjects
Genome, Human; Genomics; Animals; Humans; Segmental Duplications, Genomic; Multigene Family; Centromere/genetics; NK Cell Lectin-Like Receptor Subfamily C/genetics
8.
Bioinformatics ; 39(39 Suppl 1): i279-i287, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37387146

ABSTRACT

MOTIVATION: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover > 5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases. METHODS: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy. RESULTS: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the genome-in-a-bottle high-confidence variant calls for HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated a consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes. AVAILABILITY AND IMPLEMENTATION: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.
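The HG002 precision and recall figures quoted in this abstract can be put on a single scale with the standard F1 score (the harmonic mean of precision and recall). The following is a minimal sketch, not the authors' evaluation code: the function name `f1_score` and the dictionary `hg002` are illustrative choices, and the only inputs are the four precision/recall pairs stated in the abstract. The abstract's own "mean F1 = 0.947" is an average across seven genomes, so it need not match the single-genome values computed here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for the HG002 benchmark, copied from the abstract.
hg002 = {
    "ParascopyVC": (0.991, 0.909),
    "FreeBayes": (0.954, 0.822),
    "GATK": (0.888, 0.873),
    "DeepVariant": (0.983, 0.861),
}

# F1 per caller, derived only from the figures above.
f1_by_caller = {name: f1_score(p, r) for name, (p, r) in hg002.items()}

# Print callers from highest to lowest F1.
for name, f1 in sorted(f1_by_caller.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {f1:.3f}")
```

Under these inputs the ordering by F1 follows directly from the reported numbers; no additional data are assumed.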


Subjects
DNA Copy Number Variations; Segmental Duplications, Genomic; Humans; Whole Genome Sequencing; Benchmarking; Genome, Human
9.
Genome Res ; 33(4): 496-510, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.


Subjects
DNA, Satellite; Polymorphism, Genetic; Humans; DNA, Satellite/genetics; Haplotypes; Segmental Duplications, Genomic; Sequence Analysis, DNA
10.
Nature ; 617(7960): 325-334, 2023 May.
Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].


Subjects
Gene Conversion; Mutation; Segmental Duplications, Genomic; Humans; Gene Conversion/genetics; Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Haplotypes/genetics; Exons/genetics; Cytosine/chemistry; Guanine/chemistry; CpG Islands/genetics
11.
Genome Med ; 15(1): 35, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37165454

ABSTRACT

BACKGROUND: High sequence identity between segmental duplications (SDs) can facilitate copy number variants (CNVs) via non-allelic homologous recombination (NAHR). These CNVs are one of the fundamental causes of genomic disorders such as the 3q29 deletion syndrome (del3q29S). There are 21 protein-coding genes lost or gained as a result of such recurrent 1.6-Mbp deletions or duplications, respectively, in the 3q29 locus. While NAHR plays a role in CNV occurrence, the factors that increase the risk of NAHR at this particular locus are not well understood. METHODS: We employed an optical genome mapping technique to characterize the 3q29 locus in 161 unaffected individuals, 16 probands with del3q29S and their parents, and 2 probands with the 3q29 duplication syndrome (dup3q29S). Long-read sequencing-based, haplotype-resolved de novo assemblies from 44 unaffected individuals and 1 trio were used for orthogonal validation of haplotypes and deletion breakpoints. RESULTS: In total, we discovered 34 haplotypes, of which 19 were novel haplotypes. Among these 19 novel haplotypes, 18 were detected in unaffected individuals, while 1 novel haplotype was detected on the parent-of-origin chromosome of a proband with the del3q29S. Phased assemblies from 44 unaffected individuals enabled the orthogonal validation of 20 haplotypes. In 89% (16/18) of the probands, breakpoints were confined to paralogous copies of a 20-kbp segment within the 3q29 SDs. In one del3q29S proband, the breakpoint was confined to a 374-bp region using long-read sequencing. Furthermore, we categorized del3q29S cases into three classes and dup3q29S cases into two classes based on breakpoints. Finally, we found no evidence of inversions in parent-of-origin chromosomes. CONCLUSIONS: We have generated the most comprehensive haplotype map for the 3q29 locus using unaffected individuals, probands with del3q29S or dup3q29S, and available parents, and also determined the deletion breakpoint to be within a 374-bp region in one proband with del3q29S. These results should provide a better understanding of the underlying genetic architecture that contributes to the etiology of del3q29S and dup3q29S.


Subjects
Genomics; Segmental Duplications, Genomic; Humans; Chromosome Mapping; Syndrome; Haplotypes; DNA Copy Number Variations
12.
PLoS One ; 18(2): e0266234, 2023.
Article in English | MEDLINE | ID: mdl-36800354

ABSTRACT

Ehrlichia ruminantium is a tick-borne intracellular pathogen of ruminants that causes heartwater, a disease present in sub-Saharan Africa, islands in the Indian Ocean and the Caribbean, inducing significant economic losses. At present, three avirulent strains of E. ruminantium (Gardel, Welgevonden and Senegal isolates) have been produced by a process of serial passaging in mammalian cells in vitro, but unfortunately their use as vaccines does not offer a large range of protection against other strains, possibly due to the genetic diversity present within the species. So far no genetic basis for virulence attenuation has been identified in any E. ruminantium strain that could offer targets to facilitate vaccine production. Virulence attenuated Senegal strains have been produced twice independently, and require many fewer passages to attenuate than the other strains. We compared the genomes of a virulent and attenuated Senegal strain and identified a likely attenuator gene, ntrX, a global transcription regulator and member of a two-component system that is linked to environmental sensing. This gene has an inverted partial duplicate close to the parental gene that shows evidence of gene conversion in different E. ruminantium strains. The pseudogenisation of the gene in the avirulent Senegal strain occurred by gene conversion from the duplicate to the parent, transferring a 4 bp deletion which is unique to the Senegal strain partial duplicate amongst the wild isolates. We confirmed that the ntrX gene is not expressed in the avirulent Senegal strain by RT-PCR. The inverted duplicate structure combined with the 4 bp deletion in the Senegal strain can explain both the attenuation and the faster speed of attenuation in the Senegal strain relative to other strains of E. ruminantium. Our results identify ntrX as a promising target for the generation of attenuated strains of E. ruminantium by random or directed mutagenesis that could be used for vaccine production.


Subjects
Ehrlichia ruminantium; Animals; Ehrlichia ruminantium/genetics; Gene Conversion; Senegal; Virulence/genetics; Segmental Duplications, Genomic; Ruminants/genetics
13.
Genes (Basel) ; 13(11), 2022 Nov 11.
Article in English | MEDLINE | ID: mdl-36421776

ABSTRACT

LCR22s are among the most complex loci in the human genome and are susceptible to nonallelic homologous recombination. This can lead to a variety of genomic disorders, including deletions, duplications, and translocations, of which the 22q11.2 deletion syndrome is the most common in humans. Interrogating these phenomena is difficult due to the high complexity of the LCR22s and the inaccurate representation of the LCRs across different reference genomes. Optical mapping techniques, which provide long-range chromosomal maps, could be used to unravel the complex duplicon structure. These techniques have already uncovered the hypervariability of the LCR22-A haplotype in the human population. Although optical LCR22 mapping is a major step forward, long-read sequencing approaches will be essential to reach nucleotide resolution of the LCR22s and map the crossover sites. Accurate maps and sequences are needed to pinpoint potential predisposing alleles and, most importantly, allow for genotype-phenotype studies exploring the role of the LCR22s in health and disease. In addition, this research might provide a paradigm for the study of other rare genomic disorders.


Subjects
DiGeorge Syndrome; Segmental Duplications, Genomic; Humans; DiGeorge Syndrome/genetics; Genome, Human
14.
Genes (Basel) ; 13(11), 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36360320

ABSTRACT

The most frequent microdeletion, 22q11.2 deletion syndrome (22q11.2DS), has a wide and variable phenotype that causes difficulties in diagnosis. 22q11.2DS is a contiguous gene syndrome, but due to the existence of several low-copy-number repeat sequences (LCR) it displays a high variety of deletion types: typical deletions LCR A-D, the most common (~90%); proximal deletions LCR A-B; central deletions (LCR B, C-D); and distal deletions (LCR D-E, F). METHODS: We conducted a retrospective study of 59 22q11.2DS cases, with the aim of highlighting phenotype-genotype correlations. All cases were tested using MLPA combined kits: SALSA MLPA KIT P245 and P250 (MRC Holland). RESULTS: Most cases (76%) presented classic deletion LCR A-D with various severity and phenotypic findings. A total of 14 atypical new deletions were identified: 2 proximal deletions LCR A-B, 1 CES (Cat Eye Syndrome region) to LCR B deletion, 4 nested deletions LCR B-D and 1 LCR C-D, 3 LCR A-E deletions, 1 LCR D-E, and 2 small single gene deletions: delDGCR8 and delTOP3B. CONCLUSIONS: This study emphasizes the wide phenotypic variety and incomplete penetrance of 22q11.2DS. Our findings contribute to the genotype-phenotype data regarding different types of 22q11.2 deletions and illustrate the usefulness of MLPA combined kits in 22q11.2DS diagnosis.


Subjects
DiGeorge Syndrome; Humans; DiGeorge Syndrome/genetics; Segmental Duplications, Genomic; Retrospective Studies; Genetic Association Studies
15.
Genes (Basel) ; 13(9), 2022 Sep 17.
Article in English | MEDLINE | ID: mdl-36140835

ABSTRACT

The most prevalent microdeletion in the human population occurs at 22q11.2, a region rich in chromosome-specific low copy repeats (LCR22s). The structure of this region has eluded characterization due to a combination of size, regional complexity, and haplotype diversity. To further complicate matters, it is not well represented in the human reference genome. Most individuals with 22q11.2 deletion syndrome (22q11.2DS) carry a de novo, hemizygous deletion approximately 3 Mbp in size occurring by non-allelic homologous recombination (NAHR) mediated by the LCR22s. The ability to fully delineate an individual's 22q11.2 regional structure will likely be important for studies designed to assess an unaffected individual's risk for generating rearrangements in germ cells, potentially leading to offspring with 22q11.2DS. Towards understanding these risk factors, optical mapping has been previously employed to successfully elucidate the structure and variation of LCR22s across 30 families affected by 22q11.2DS. The father in one of these families carries a t(11;22)(q23;q11) translocation. Surprisingly, it was determined that he is the parent-of-deletion-origin. NAHR, which occurred between his der(22) and intact chromosome 22, led to a 22q11.2 deletion in his affected child. The unaffected sibling of the proband with 22q11.2DS inherited the father's normal chromosome 22, which did not aberrantly recombine. This unexpected observation definitively shows that haplotypes that engage in NAHR can also be inherited intact. This study is the first to identify all structures involving a rearranged chromosome 22 that also participates in NAHR leading to a 22q11.2 deletion.


Subjects
DiGeorge Syndrome; Alleles; Child; DiGeorge Syndrome/genetics; Homologous Recombination/genetics; Humans; Male; Parents; Segmental Duplications, Genomic; Translocation, Genetic/genetics
16.
Nat Commun ; 13(1): 3221, 2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35680869

ABSTRACT

The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.


Subjects
DNA Copy Number Variations; Genome, Human; DNA Copy Number Variations/genetics; Genome, Human/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Segmental Duplications, Genomic; Sequence Analysis, DNA/methods; Whole Genome Sequencing/methods
17.
Cell ; 185(11): 1986-2005.e26, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subjects
Chromosome Inversion; Segmental Duplications, Genomic; Chromosome Inversion/genetics; DNA Copy Number Variations/genetics; Genome, Human; Genomics; Humans
18.
Mol Biol Evol ; 39(5), 2022 May 03.
Article in English | MEDLINE | ID: mdl-35574660

ABSTRACT

Segmental duplications (SDs) constitute a considerable fraction of primate genomes. They contribute to genetic variation and provide raw material for evolution. Groups of SDs are characterized by the presence of shared core duplicons. One of these core duplicons, low copy repeat (lcr)16a, has been shown to be particularly active in the propagation of interspersed SDs in primates. The underlying mechanisms are, however, only partially understood. Alu short interspersed elements (SINEs) are frequently found at breakpoints and have been implicated in the expansion of SDs. Detailed analysis of lcr16a-containing SDs shows that the hominid-specific SVA (SINE-R-VNTR-Alu) retrotransposon is an integral component of the core duplicon in Asian and African great apes. In orang-utan, it provides breakpoints and contributes to both interchromosomal and intrachromosomal lcr16a mobility by inter-element recombination. Furthermore, the data suggest that in hominines (human, chimpanzee, gorilla) SVA recombination-mediated integration of a circular intermediate is the founding event of a lineage-specific lcr16a expansion. One of the hominine lcr16a copies displays large flanking direct repeats, a structural feature shared by other SDs in the human genome. Taken together, the results obtained extend the range of SVAs' contribution to genome evolution from RNA-mediated transduction to DNA-based recombination. In addition, they provide further support for a role of circular intermediates in SD mobilization.


Subjects
Hominidae; Segmental Duplications, Genomic; Animals; Evolution, Molecular; Genome, Human; Hominidae/genetics; Humans; Primates/genetics; Retroelements
19.
Genes (Basel) ; 13(5), 2022 May 19.
Article in English | MEDLINE | ID: mdl-35627290

ABSTRACT

Intragenic segmental duplication regions are potential hotspots for recurrent copy number variation and possible pathogenic aberrations. Two large sarcomeric genes, nebulin and titin, both contain such segmental duplication regions. Using our custom Comparative Genomic Hybridisation array, we have previously shown that a gain or loss of more than one copy of the repeated block of the nebulin triplicate region constitutes a recessive pathogenic mutation. Using targeted array-CGH, similar copy number variants can be detected in the segmental duplication region of titin. Due to the limitations of the array-CGH methodology and the repetitiveness of the region, the exact copy numbers of the blocks could not be determined. Therefore, we developed complementary custom Droplet Digital PCR assays for the titin segmental duplication region to confirm true variation. Our combined methods show that the titin segmental duplication region is subject to recurrent copy number variation. Gains and losses were detected in samples from healthy individuals as well as in samples from patients with different muscle disorders. The copy number variation observed in our cohort is likely benign, but pathogenic copy number variants in the segmental duplication region of titin cannot be excluded. Further investigations are needed; however, this region should no longer be neglected in genetic analyses.


Subjects
DNA Copy Number Variations; Segmental Duplications, Genomic; Connectin/genetics; DNA Copy Number Variations/genetics; Genomics; Humans; Muscle Proteins; Polymerase Chain Reaction; Segmental Duplications, Genomic/genetics
20.
Nat Methods ; 19(6): 705-710, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35365778

ABSTRACT

Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.


Subjects
Genome, Human; Repetitive Sequences, Nucleic Acid; Alleles; Humans; Repetitive Sequences, Nucleic Acid/genetics; Segmental Duplications, Genomic; Sequence Analysis, DNA; Tandem Repeat Sequences